-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mitigate size issues with kselftest kernels #1759
base: main
Are you sure you want to change the base?
Conversation
bf7f211
to
bd4aae3
Compare
Successful builds of the kselftest-slim fragment: https://staging.kernelci.org/build/kernelci/branch/staging-mainline/kernel/staging-mainline-20230227.1/ |
Some selftest-slim configs running: https://staging.kernelci.org/test/plan/id/63fd530c6f12a6da709d58cc/ and one example that didn't finish booting due to an underlying regression in mainline (fix on the way) but which you can see from the logs manages to download and boot the kernel: https://staging.kernelci.org/test/plan/id/63fd56b469ca1c7df59d59a2/ Run that actually runs but skips all the tests due to an issue in test-definitions: https://staging.kernelci.org/test/plan/id/63fd548a130886541b9d58df/ which was the issue here. uImage size for arm64 goes from 53.2Mb to 43.1Mb. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To minimise the extra build load, maybe we could just replace all the kselftest
builds with kselftest-slim
and also enable the full kselftest
builds with the main branches e.g. next/master
, mainline/master
, kselftest/fixes
?
Another, simpler approach is to just disable kselftest on the platforms that can't handle the kernel image size... At least we know that won't have unexpected side-effects.
slim_skip_frags = [ | ||
"tools/testing/selftests/lkdtm/config", | ||
"tools/testing/selftests/cpufreq/config" | ||
] | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about defining fragments in YAML which disable some configs and then have e.g. kselftest+no-lkdtm+no-cpufreq
? That way we don't need to pile up more logic on this kselftest corner-case fragment. All the other fragments are entirely defined in YAML and I think it would be better if we also tried to remove the kselftest special case.
Note: A more longer-term solution I think would be to use the generic way of making the kselftest config with make kselftest-merge
and some way of skipping fragments in the upstream Makefile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be nice but it's much more work than I'm likely to be able to do in the medium term - we don't have the infrastructure to negatively apply config fragments like this. Getting something done upstream is probably interesting too, it's not clear that it'd even be recommended to put all the config fragments in one kernel as standard (for the reasons we're seeing here).
Sure, though I'm not sure how exactly to express that in the filters and since we don't actually run branches called those things in staging it makes it very difficult get any further changes in. I'm not sure how loaded we are for builds at the minute.
That severely cuts down the set of platforms we've got available to us, and/or increases the admin cost for playing with image size options for those platforms. The whole goal here is to avoid having to do that and get more tests run. The resulting kernels running faster is also a win under emulation. |
We can run any build config on staging, just some specially tailored ones are run periodically.
We occasionally hit some congestion. And yes, not knowing precisely how loaded the system is means we may be overloading it by adding more builds.
In a way, yes of course. But if such platforms can't run with the full kselftest build then we're not losing any coverage by disabling them. Then we can start introducing new builds with smaller images and more incrementally extend build load and test coverage. Whichever path we take, it seems unavoidable to try and get some metrics about the additional build costs, whether we can avoid unnecessary builds, and any change in the number of tests run in the labs. |
Right, but doing anything non-standard is really painful - at a minimum you need to send a pull request to kernelci-deploy then wait some indeterminate time to get it deployed which adds a lot of latency and stop energy.
It would probably help to get some progress on using the Kubernetes clusters better but that's all stalled out... I suspect we have idle capacity at the minute.
It's hurting in cases where things add a fragment (like KVM has done, it's running on some of the affected u-boot platforms) if nothing else, and it is generally a pain when it comes to taking advantage of the labs/boards we have.
What would you suggest here? |
No you just start a monitor job on bot.staging.kernelci.org with the build config you want to cover.
Run linux-next on staging, in fact it happens every weekend already so maybe check results on Monday and compare with the production ones to see the increase in build and tests. Getting some metrics from the build clusters and lab usage would be neat but I don't think we have enough tools in place for that right now. |
Which monitor job are you thinking of here? kernel-tree-monitor-next doesn't seem to have done anything for example.
It does? I'm not seeing the results showing up in the UI. |
https://bot.staging.kernelci.org/job/kernel-tree-monitor/build?delay=0sec Enter the build configs in the
OK it hasn't run since 13th January but that's because something is broken: @mgalka @VinceHillier Could you please take a look why the linux-next jobs aren't running? |
Actually since |
This is the starting point, on
|
Now with this change in YAML: diff --git a/config/core/build-configs.yaml b/config/core/build-configs.yaml
index f4a2422fdd6a..759ce711596d 100644
--- a/config/core/build-configs.yaml
+++ b/config/core/build-configs.yaml
@@ -418,13 +418,17 @@ fragments:
- 'CONFIG_IMA=y'
- 'CONFIG_IMA_READ_POLICY=y'
- kselftest:
+ kselftest: &kselftest-fragment
path: "kernel/configs/kselftest.config"
configs:
- '# CONFIG_DUMMY is not set'
- 'CONFIG_NET_IPGRE=m'
- 'CONFIG_NET_IPGRE_DEMUX=m'
+ kselftest-slim:
+ <<: *kselftest-fragment
+ path: "kernel/configs/kselftest-slim.config"
+
preempt_rt:
path: "kernel/configs/preempt_rt.config"
configs:
@@ -711,7 +715,7 @@ build_configs_defaults:
fragments: &default_fragments
- 'debug'
- - 'kselftest'
+ - 'kselftest-slim'
- 'tinyconfig'
architectures: &default_architectures
@@ -748,7 +752,7 @@ build_configs_defaults:
- 'allnoconfig'
- 'defconfig+CONFIG_CPU_BIG_ENDIAN=y'
- 'defconfig+CONFIG_RANDOMIZE_BASE=y'
- - 'defconfig+arm64-chromebook+kselftest'
+ - 'defconfig+arm64-chromebook+kselftest-slim'
- 'defconfig+arm64-chromebook+videodec'
fragments: [arm64-chromebook, crypto, ima, videodec]
@@ -777,7 +781,7 @@ build_configs_defaults:
extra_configs:
- 'allmodconfig'
- 'allnoconfig'
- - 'x86_64_defconfig+x86-chromebook+kselftest'
+ - 'x86_64_defconfig+x86-chromebook+kselftest-slim'
- 'x86_64_defconfig+x86-chromebook+amdgpu'
fragments: [amdgpu, crypto, ima, x86_kvm_guest, x86-chromebook]
@@ -1097,15 +1101,46 @@ build_configs:
tree: mainline
branch: 'master'
variants:
- gcc-10: *default_gcc-10
+ gcc-10:
+ <<: *default_gcc-10
+ fragments: &fragments-kselftest
+ - 'debug'
+ - 'kselftest'
+ - 'kselftest-slim'
+ - 'tinyconfig'
+
+ architectures:
+ <<: *default_architectures
+ arm64: &arm64_arch-kselftest
+ <<: *arm64_arch
+ extra_configs:
+ - 'allmodconfig'
+ - 'allnoconfig'
+ - 'defconfig+CONFIG_CPU_BIG_ENDIAN=y'
+ - 'defconfig+CONFIG_RANDOMIZE_BASE=y'
+ - 'defconfig+arm64-chromebook+kselftest'
+ - 'defconfig+arm64-chromebook+kselftest-slim'
+ - 'defconfig+arm64-chromebook+videodec'
+
+ x86_64: &x86_64_arch-kselftest
+ <<: *x86_64_arch
+ extra_configs:
+ - 'allmodconfig'
+ - 'allnoconfig'
+ - 'x86_64_defconfig+x86-chromebook+kselftest'
+ - 'x86_64_defconfig+x86-chromebook+kselftest-slim'
+ - 'x86_64_defconfig+x86-chromebook+amdgpu'
+
# Minimum version
clang-11:
build_environment: clang-11
architectures: *arch_clang_configs
+
# Latest stable release
clang-16:
build_environment: clang-16
architectures: *arch_clang_configs
+
rustc-1.62:
build_environment: rustc-1.62
fragments: [rust, rust-samples, kselftest]
@@ -1150,10 +1185,10 @@ build_configs:
variants:
gcc-10:
build_environment: gcc-10
- fragments: *default_fragments
+ fragments: *fragments-kselftest
architectures:
i386: *i386_arch
- x86_64: *x86_64_arch
+ x86_64: *x86_64_arch-kselftest
mips: *mips_arch
riscv: *riscv_arch
sparc: *sparc_arch
@@ -1168,6 +1203,7 @@ build_configs:
- 'defconfig+CONFIG_CPU_BIG_ENDIAN=y'
- 'defconfig+CONFIG_RANDOMIZE_BASE=y'
- 'defconfig+arm64-chromebook+kselftest'
+ - 'defconfig+arm64-chromebook+kselftest-slim'
arm:
base_defconfig: 'multi_v7_defconfig'
extra_configs:
|
Then with the change above, other build configs still only build the slim ones since it's the only one listed in the default fragments:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comment above, using kselftest-slim
by default and only enabling the full kselftest
builds in mainline and next.
I can't see any sensible way of adding some configs to something using build_configs_defaults without copying the entire thing for most architectures? They all add extra_configs and AFAICT you can't then extend extra_configs, you have to cut'n'paste the whole thing. It's really hard to understand where or how to make any changes, there's a bunch of things layered and partially replaced/filtered. |
Well I think the comments above show how to do that, right? |
Your comment raced with mine. If that's what you want I guess that's what you want - it's all making things really hard to follow to my eyes, stuff like that is why it's hard to tell how to do things, there's a lot of cut'n'paste and backreferences. |
Yes it's a complex configuration file, because there's some complex configuration data. The command line to list the configs is an easy way to check it's producing what's expected. I don't want anything in particular, just saying that adding one kselftest build everywhere by default isn't a great idea so narrowing it down to only mainline and linux-next should provide about the same runtime coverage with a much smaller build footprint. |
That doesn't seem to be working properly, the fragements for GCC 10 seem to be being ignored. |
Guess if you don't want to fix it then we'll just close it and stop running kselftest on boards that can't load the kernel, and create an issue for someone else to have a look? |
I've pushed what I've got, I'm not sure if it's to do with needing to have some other environment set up locally or something. Some of the fragments are appearing but not all. Please understand that there's huge numbers of sharp edges with this stuff, it's not a question of not wanting to do stuff it's a question of having the bandwidth to work out what's going on based on the minimal guidance available - in this case it looks like the main thing I wasn't seeing was that the enormous cut'n'paste is considered tasteful enough. The tools to inspect what happens aren't the issue, it's trying to figure out the logic behind the YAML. |
Of course, it could be that you can't spend more time on it because it's not user-friendly and too time-consuming. Whatever the reason, we need to get PRs resolved and either someone has to fix these issues they get closed. Finding simpler alternatives like disabling tests that don't run and creating issues to keep track of what needs to be improved is also a good way of making progress and reducing the PR backlog. We can't wait another few months until we have the new YAML configuration format and associated CLI tools e.g. |
Some more guidance would have really helped here, for example a pointer that the config language can't really cope here would have helped a lot here (or just generally some sort of pointer when I said I couldn't see how to express this in the filter language). Please also consider that there's been a massive change in pace recently with how quickly things are looked at, that does have knock on effects on expectations as a submitter. |
Yeah basically in YAML you can override dictionaries using anchors but not lists. |
Adding |
Why is this getting skipped rather than merged? The inability to run kselftest configs is holding up a whole bunch of other stuff. Usually whatever got pushed last get stuck behind what got merged first. |
There was just no verification of the results on staging. I can review it again next week. |
The kselftest fragment is a constant source of problems booting boards since it ends up being very much larger than standard configurations which cause lots of problems for u-boot boards, limiting our ability to run the selftests and creating constant overhead. This mainly comes from the cpufreq and LKDTM fragments, they turn on some options which instrument the entire kernel which adds overhead everywhere. Add a new special Kconfig fragment type kselftest-slim which merges all fragments other than those two. For x86_64 defconfig this produces a substantial improvement in image size, bloat-o-meter reports: Total: Before=38073475, After=24629302, chg -35.31% for kselftest and kselftest-slim. If other fragments start causing similar issues we can add them to the list filtered out by kselftest-slim. Signed-off-by: Mark Brown <[email protected]>
Add kselftest-slim builds everywhere we build kselftest. Signed-off-by: Mark Brown <[email protected]>
…lftest The smaller kselftest-slim configs boot much more easily on u-boot platforms which aren't able to automatically place images so have fixed size regions they download binaries to so use them as the default configuration for everything except cpufreq and LKDTM which are not included in the slim configuration. Signed-off-by: Mark Brown <[email protected]>
OK thanks for the rebase, I'll see which devices in the Collabora lab can be used to test the full kselftest build and put a comment here. |
Finally got round to discussing this with the Collabora lab folks, will come back with a summary soon. Thanks for your patience :) |
Adding skip label because of #1994 |
@broonie |
We struggle to run kselftest on a lot of u-boot platforms since the kselftest builds produce a kernel which is much larger than usual, and some of the emulated platforms can struggle at runtime since they are also slower than usual. The increase in size is mostly due to only two of the config fragments, those for cpufreq and LKDTM. These add options that instrument the entire kernel, inflating the size of the resulting image substantially.
While we can address the configuration of individual boards this is a constant burden and we still have the overhead of downloading the larger images and slower execution caused by the instrumentation to deal with. Instead let's define a second kselftest fragment which excludes the problematic fragments and use that unless we need the extra fragments.
This does mean we end up doing an extra build whenever we build kselftest but the improved test coverage and runtime performance seems worth it.